Imperfect information games (IIGs) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound of $\Omega(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the number of realizations required to learn these strategies with high probability, where $H$ is the length of the game, and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ realizations without this requirement by progressively adapting the regularization to the observations.
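For readers unfamiliar with the FTRL family the abstract builds on, the following is a minimal sketch of generic FTRL on the probability simplex with a fixed negative-entropy regularizer (which reduces to exponential weights), under trajectory-style bandit feedback. It is illustrative only: the paper's Balanced-FTRL and Adaptive-FTRL use game-tree-specific and adaptive regularizers, not this fixed one, and the names `ftrl_simplex`, `loss_fn`, and `eta` are hypothetical.

```python
import numpy as np

def ftrl_simplex(loss_fn, n_actions, n_rounds, eta=0.1, rng=None):
    """Generic FTRL on the simplex with negative-entropy regularization.

    Illustrative sketch only -- not the paper's Balanced-FTRL or
    Adaptive-FTRL, which regularize according to the game-tree structure.
    """
    rng = rng or np.random.default_rng(0)
    cumulative_loss = np.zeros(n_actions)
    for _ in range(n_rounds):
        # FTRL iterate: argmin_p <p, L_t> + (1/eta) * sum_a p_a log p_a,
        # whose closed form is a softmax over -eta * L_t.
        logits = -eta * cumulative_loss
        policy = np.exp(logits - logits.max())
        policy /= policy.sum()
        action = rng.choice(n_actions, p=policy)
        # Bandit feedback: only the played action's loss is observed;
        # an importance-weighted estimate updates the cumulative losses.
        loss = loss_fn(action)
        cumulative_loss[action] += loss / policy[action]
    return policy

# Example with a fixed loss vector; the policy concentrates on action 2.
final_policy = ftrl_simplex(lambda a: [0.9, 0.7, 0.1][a],
                            n_actions=3, n_rounds=5000)
```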
This work presents a new kinematic design for a prosthetic hand, driven by a single actuator, that enables both a tripod grip and a lateral grip. Inspired by three-digit prostheses, which are simpler, more robust, and cheaper than multi-digit prostheses, this new kinematics aims to provide an accessible prosthesis (affordable, easy to use, robust, easy to repair). Cables are used instead of rigid rods to transmit the motion of the fingers and thumb. The paper details the methodology and the design choices. Finally, an experimental evaluation of the prototype by users leads to a first discussion of the results.